The goal of this project is to explore some locational data with the folium library. Let's start off by importing folium for map creation and pandas for parsing a CSV and creating a dataframe.
import folium
import pandas as pd
For this project, I have chosen a dataset containing the vehicle crash data for the state of Maryland. Most of the data is centered around Baltimore, so we will focus on that region for now.
Now let's parse the file and create a dataframe of crash data.
# low_memory=False avoids mixed-dtype warnings when parsing this large CSV
crash_data = pd.read_csv("https://cmsc320.github.io/files/Baltimore_City_Vehicle_Crashes.csv", low_memory = False)
print(crash_data.columns)
crash_data
Index(['YEAR', 'QUARTER', 'LIGHT_DESC', 'LIGHT_CODE', 'COUNTY_NO', 'MUNI_DESC',
'MUNI_CODE', 'JUNCTION_DESC', 'JUNCTION_CODE', 'COLLISION_TYPE_DESC',
'COLLISION_TYPE_CODE', 'SURF_COND_DESC', 'SURF_COND_CODE', 'LANE_DESC',
'LANE_CODE', 'RD_COND_DESC', 'RD_COND_CODE', 'RD_DIV_DESC',
'RD_DIV_CODE', 'FIX_OBJ_DESC', 'FIX_OBJ_CODE', 'REPORT_NO',
'REPORT_TYPE', 'WEATHER_DESC', 'WEATHER_CODE', 'ACC_DATE', 'ACC_TIME',
'LOC_CODE', 'SIGNAL_FLAG_DESC', 'SIGNAL_FLAG', 'C_M_ZONE_FLAG',
'AGENCY_CODE', 'AREA_CODE', 'HARM_EVENT_DESC1', 'HARM_EVENT_CODE1',
'HARM_EVENT_DESC2', 'HARM_EVENT_CODE2', 'RTE_NO', 'ROUTE_TYPE_CODE',
'RTE_SUFFIX', 'LOG_MILE', 'LOGMILE_DIR_FLAG_DESC', 'LOGMILE_DIR_FLAG',
'MAINROAD_NAME', 'DISTANCE', 'FEET_MILES_FLAG_DESC', 'FEET_MILES_FLAG',
'DISTANCE_DIR_FLAG', 'REFERENCE_NO', 'REFERENCE_TYPE_CODE',
'REFERENCE_SUFFIX', 'REFERENCE_ROAD_NAME', 'LATITUDE', 'LONGITUDE',
'LOCATION'],
dtype='object')
| | YEAR | QUARTER | LIGHT_DESC | LIGHT_CODE | COUNTY_NO | MUNI_DESC | MUNI_CODE | JUNCTION_DESC | JUNCTION_CODE | COLLISION_TYPE_DESC | ... | FEET_MILES_FLAG_DESC | FEET_MILES_FLAG | DISTANCE_DIR_FLAG | REFERENCE_NO | REFERENCE_TYPE_CODE | REFERENCE_SUFFIX | REFERENCE_ROAD_NAME | LATITUDE | LONGITUDE | LOCATION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Q2 | NaN | 6.02 | 24.0 | NaN | NaN | Non Intersection | 1.0 | Other | ... | Miles | M | N | NaN | NaN | NaN | NORTH AVE | 39.311025 | -76.616429 | POINT (-76.616429453205 39.311024794431) |
| 1 | 2017 | Q2 | Daylight | 1.00 | 24.0 | NaN | NaN | NaN | NaN | Single Vehicle | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 39.282928 | -76.635215 | POINT (-76.6352150952347 39.2829284750108) |
| 2 | 2020 | Q2 | Daylight | 1.00 | 24.0 | NaN | NaN | Intersection | 2.0 | Other | ... | Feet | F | S | NaN | NaN | NaN | WINDSOR AVE | 39.312903 | -76.651472 | POINT (-76.651471912939 39.312903404529) |
| 3 | 2020 | Q2 | Daylight | 1.00 | 24.0 | NaN | NaN | NaN | 99.0 | Same Movement Angle | ... | NaN | U | E | NaN | NaN | NaN | WASHINGTON ST | 39.294944 | -76.599329 | POINT (-76.599328693204 39.294943770185) |
| 4 | 2020 | Q2 | NaN | 5.02 | 24.0 | NaN | NaN | NaN | NaN | Same Direction Rear End | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 39.296874 | -76.595871 | POINT (-76.595871121891 39.296873988072) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 135585 | 2021 | Q4 | Unknown | 99.00 | 24.0 | NaN | NaN | Non Intersection | 1.0 | Same Direction Sideswipe | ... | Feet | F | E | NaN | NaN | NaN | LINDEN AVE. | 39.315169 | -76.635210 | POINT (-76.635209753929 39.315168933901) |
| 135586 | 2021 | Q4 | Daylight | 1.00 | 24.0 | NaN | NaN | Intersection | 2.0 | Single Vehicle | ... | Feet | F | N | NaN | NaN | NaN | N CALVERT ST | 39.291235 | -76.612507 | POINT (-76.61250671377 39.291234751225) |
| 135587 | 2021 | Q4 | Dark Lights On | 3.00 | 24.0 | NaN | NaN | Intersection | 2.0 | Same Direction Rear End | ... | Feet | F | S | NaN | NaN | NaN | E FAYETT | 39.290660 | -76.606670 | POINT (-76.60667 39.29066) |
| 135588 | 2021 | Q4 | Dark Lights On | 3.00 | 24.0 | NaN | 999.0 | Non Intersection | 1.0 | Same Direction Rear End Left Turn | ... | Feet | F | W | 2125.0 | MU | NaN | GARRISON BLVD | 39.343406 | -76.640647 | POINT (-76.640646523492 39.343405723453) |
| 135589 | 2021 | Q4 | Dark Lights On | 3.00 | 24.0 | NaN | NaN | Non Intersection | 1.0 | Same Direction Rear End | ... | Feet | F | N | NaN | NaN | NaN | CHARLES ST | 39.293148 | -76.609680 | POINT (-76.609679764014 39.293147795494) |
135590 rows × 55 columns
The first thing to note is that the dataframe has over 100,000 rows, and for the sake of performance, we cannot display them all on our map. So, instead of displaying all crashes, let's display the crashes for only a certain year.
It is also important to note that the data records many different features of a crash, such as light description, junction description, surface condition, and more.
Let's make sure that each row has a longitude, latitude, and year that is not null. Let's also make sure that the year is a numerical type.
print(crash_data["YEAR"].value_counts())
print("---------------")
valid_latitudes = crash_data[pd.notnull(crash_data["LATITUDE"])]["LATITUDE"].count()
print("The number of valid latitudes is " + str(valid_latitudes))
valid_longitudes = crash_data[pd.notnull(crash_data["LONGITUDE"])]["LONGITUDE"].count()
print("The number of valid longitudes is " + str(valid_longitudes))
print("The length of the crash data is " + str(len(crash_data)))
2016    25763
2015    23682
2017    19220
2018    17490
2019    17020
2021    16858
2020    15557
Name: YEAR, dtype: int64
---------------
The number of valid latitudes is 135590
The number of valid longitudes is 135590
The length of the crash data is 135590
We can see that every row has a non-null year and that the years are stored as integers. Additionally, all 135590 rows have a valid latitude and longitude, which matches the total number of rows in the dataframe.
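As a quick sanity check, these properties can also be asserted directly. The sketch below uses a small made-up frame standing in for `crash_data` (the values are hypothetical, not rows from the dataset):

```python
import pandas as pd

# Toy stand-in for crash_data with the three columns we care about
df = pd.DataFrame({
    "YEAR": [2020, 2021, 2020],
    "LATITUDE": [39.31, 39.28, 39.29],
    "LONGITUDE": [-76.61, -76.63, -76.65],
})

# No nulls in any of the three columns, and YEAR is an integer dtype
assert df[["YEAR", "LATITUDE", "LONGITUDE"]].notnull().all().all()
assert pd.api.types.is_integer_dtype(df["YEAR"])
```

If either assertion failed on the real data, we would need to drop or repair those rows before plotting.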
Since the year 2020 has the fewest rows, let's display the crash data for 2020. We now need to find a criterion by which to separate the 2020 data. This allows us to display the different values in different colors on the map, so we can spot any locational trends that may appear.
We could separate the crashes by whether a signal flag was present near the crash. A signal flag is a flag which alerts drivers to nearby traffic conditions; it might read "Congestion Ahead" or "Construction Zone".
print(len(crash_data[crash_data["YEAR"] == 2020]))
signal_flag_data = crash_data[crash_data["YEAR"] == 2020]["SIGNAL_FLAG_DESC"]
print(signal_flag_data.count())
15557
15557
For the year 2020, there is a signal flag description for each crash.
signal_flag_data.value_counts()
No     9537
Yes    6020
Name: SIGNAL_FLAG_DESC, dtype: int64
By looking at the number of crashes with and without a signal flag, we can see that about 40 percent of the crashes have a signal flag while about 60 percent do not. While looking at the map data, we should keep in mind that crashes without a signal flag are more common.
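Those percentages can be computed directly with `value_counts(normalize=True)`. The sketch below rebuilds the 2020 signal-flag column from the counts reported above rather than re-reading the dataset:

```python
import pandas as pd

# Rebuild the 2020 signal-flag column from the counts printed above
flags = pd.Series(["No"] * 9537 + ["Yes"] * 6020, name="SIGNAL_FLAG_DESC")

# normalize=True returns each value's share of the total instead of raw counts
shares = flags.value_counts(normalize=True)
print(shares.round(3))  # No ~0.613, Yes ~0.387
```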
Now let's create a folium map that is centered based on the mean of the location data in the dataframe.
map_osm = folium.Map(location=[crash_data["LATITUDE"].mean(), crash_data["LONGITUDE"].mean()], zoom_start=11,
width = 1000, height = 600)
map_osm
Let's now add all of our points to the map. Crashes in the year 2020 which had a signal flag will be displayed in blue, while points in the same year without a signal flag will be displayed in red. This allows us to spot any locational trends associated with the presence of a signal flag at a crash.
# Create a new dataframe for only the year 2020 and iterate through its rows
filtered_data = crash_data[crash_data["YEAR"] == 2020]
for index, crash in filtered_data.iterrows():
# The folium points were added as a circle rather than a marker
# to improve performance
if crash["SIGNAL_FLAG_DESC"] == "Yes":
folium.Circle(location=[crash["LATITUDE"], crash["LONGITUDE"]],
color = "blue").add_to(map_osm)
elif crash["SIGNAL_FLAG_DESC"] == "No":
folium.Circle(location=[crash["LATITUDE"], crash["LONGITUDE"]],
color = "red").add_to(map_osm)
Now that all of the desired points have been added, let's display the map.
map_osm